This markdown imports the processed data and conducts the exploratory data analysis. Follow the processing markdowns in the processing_code folder to generate the processed data.
This analysis examines the predictors of allocation of FEMA funds during disaster declarations. In terms of the datasets, the two outcomes of interest are Requested Amount (ReqAmt) and Obligated Amount (OblAmt).
For each variable, the following steps will be completed:
The steps will be numbered for each variable as specified above.
The following R packages are required for this script:
Load the data generated from the markdowns in the processing_code folder.
#define file path
data_location <- here::here("data", "processed_data", "combinedprocessed.rds")
#load data
EDAdata <- readRDS(data_location)
To better understand the data, let’s use summarytools to better visualize the data.
summarytools::dfSummary(EDAdata)
## Data Frame Summary
## EDAdata
## Dimensions: 446 x 21
## Duplicates: 0
##
## ----------------------------------------------------------------------------------------------------------------------------
## No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
## ---- ------------------ ----------------------------------- --------------------- --------------------- ---------- ---------
## 1 state 1. LA 22 ( 4.9%) 446 0
## [character] 2. TX 21 ( 4.7%) (100.0%) (0.0%)
## 3. CA 20 ( 4.5%)
## 4. FL 18 ( 4.0%)
## 5. WV 16 ( 3.6%)
## 6. MS 15 ( 3.4%)
## 7. NC 14 ( 3.1%)
## 8. GA 13 ( 2.9%)
## 9. SC 13 ( 2.9%)
## 10. OK 12 ( 2.7%)
## [ 41 others ] 282 (63.2%) IIIIIIIIIIII
##
## 2 disasterNumber Mean (sd) : 4100.9 (418.7) 445 distinct values : . : . 446 0
## [integer] min < med < max: : : : : : (100.0%) (0.0%)
## 3345 < 4237.5 < 4624 : : : : :
## IQR (CV) : 859.2 (0.1) : : : : : : :
## : : : : : : :
##
## 3 IncidentYear Mean (sd) : 2017 (2.9) 2012 : 43 ( 9.6%) I 446 0
## [numeric] min < med < max: 2013 : 41 ( 9.2%) I (100.0%) (0.0%)
## 2012 < 2018 < 2021 2014 : 26 ( 5.8%) I
## IQR (CV) : 5 (0) 2015 : 37 ( 8.3%) I
## 2016 : 27 ( 6.1%) I
## 2017 : 48 (10.8%) II
## 2018 : 42 ( 9.4%) I
## 2019 : 41 ( 9.2%) I
## 2020 : 115 (25.8%) IIIII
## 2021 : 26 ( 5.8%) I
##
## 4 IncidentMonth Mean (sd) : 5.6 (3.6) 12 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 1 < 6 < 12 :
## IQR (CV) : 7 (0.6) : . . . : :
## : : : : : . : : : :
##
## 5 declarationType 1. DR 330 (74.0%) IIIIIIIIIIIIII 446 0
## [character] 2. EM 116 (26.0%) IIIII (100.0%) (0.0%)
##
## 6 incidentType 1. Severe Storm(s) 117 (26.2%) IIIII 446 0
## [character] 2. Hurricane 105 (23.5%) IIII (100.0%) (0.0%)
## 3. Biological 78 (17.5%) III
## 4. Flood 75 (16.8%) III
## 5. Fire 22 ( 4.9%)
## 6. Severe Ice Storm 12 ( 2.7%)
## 7. Tornado 11 ( 2.5%)
## 8. Earthquake 5 ( 1.1%)
## 9. Snow 5 ( 1.1%)
## 10. Coastal Storm 4 ( 0.9%)
## [ 6 others ] 12 ( 2.7%)
##
## 7 declarationTitle 1. COVID-19 PANDEMIC 52 (11.7%) II 446 0
## [character] 2. SEVERE STORMS AND FLOODIN 38 ( 8.5%) I (100.0%) (0.0%)
## 3. SEVERE STORMS, TORNADOES, 33 ( 7.4%) I
## 4. COVID-19 26 ( 5.8%) I
## 5. HURRICANE SANDY 17 ( 3.8%)
## 6. SEVERE STORMS, FLOODING, 16 ( 3.6%)
## 7. SEVERE STORMS, TORNADOES, 14 ( 3.1%)
## 8. SEVERE WINTER STORM 13 ( 2.9%)
## 9. WILDFIRES 12 ( 2.7%)
## 10. HURRICANE IRMA 10 ( 2.2%)
## [ 105 others ] 215 (48.2%) IIIIIIIII
##
## 8 IncidentDuration Mean (sd) : 128.2 (246) 70 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 12 < 660 :
## IQR (CV) : 34.7 (1.9) :
## : :
##
## 9 ResponseDays Mean (sd) : 1332.2 (914.4) 327 distinct values : 446 0
## [numeric] min < med < max: . : (100.0%) (0.0%)
## 70 < 1098 < 3588 : : . .
## IQR (CV) : 1358.2 (0.7) : : : : : .
## : : : : : : :
##
## 10 IH Min : 0 0 : 267 (59.9%) IIIIIIIIIII 446 0
## [numeric] Mean : 0.4 1 : 179 (40.1%) IIIIIIII (100.0%) (0.0%)
## Max : 1
##
## 11 IH_pct Mean (sd) : 0.2 (1.4) 147 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 0 < 27.7 :
## IQR (CV) : 0.1 (5.5) :
## :
##
## 12 PA Min : 0 0 : 12 ( 2.7%) 446 0
## [numeric] Mean : 1 1 : 434 (97.3%) IIIIIIIIIIIIIIIIIII (100.0%) (0.0%)
## Max : 1
##
## 13 PA_pct Mean (sd) : 0.6 (1.3) 284 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 0.3 < 27.3 :
## IQR (CV) : 0.9 (2.4) :
## :
##
## 14 HM Min : 0 0 : 117 (26.2%) IIIII 446 0
## [numeric] Mean : 0.7 1 : 329 (73.8%) IIIIIIIIIIIIII (100.0%) (0.0%)
## Max : 1
##
## 15 HM_pct Mean (sd) : 0.4 (1.3) 256 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 0.2 < 27.2 :
## IQR (CV) : 0.4 (3.5) :
## :
##
## 16 Population 1. 4,657,757 22 ( 4.9%) 446 0
## [character] 2. 29,145,505 21 ( 4.7%) (100.0%) (0.0%)
## 3. 39,538,223 20 ( 4.5%)
## 4. 21,538,187 18 ( 4.0%)
## 5. 1,793,716 16 ( 3.6%)
## 6. 2,961,279 15 ( 3.4%)
## 7. 10,439,388 14 ( 3.1%)
## 8. 10,711,908 13 ( 2.9%)
## 9. 5,118,425 13 ( 2.9%)
## 10. 3,959,353 12 ( 2.7%)
## [ 41 others ] 282 (63.2%) IIIIIIIIIIII
##
## 17 Counties Mean (sd) : 74.8 (52.3) 46 distinct values : 446 0
## [integer] min < med < max: : : (100.0%) (0.0%)
## 1 < 67 < 254 : :
## IQR (CV) : 46.8 (0.7) : : : :
## : : : : . . .
##
## 18 TotalAgencies Mean (sd) : 4.9 (6.6) 31 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 2 < 55 :
## IQR (CV) : 6 (1.3) :
## : : . .
##
## 19 ReqAmt Mean (sd) : 29673687 (218785540) 332 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## -855200.3 < 109471.6 < 4388540486 :
## IQR (CV) : 1467763 (7.4) :
## :
##
## 20 OblAmt Mean (sd) : 29744109 (219676788) 332 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## -855200.3 < 109471.6 < 4408424297 :
## IQR (CV) : 1470450 (7.4) :
## :
##
## 21 FEMACSAvg Mean (sd) : 0.8 (0.4) 137 distinct values : 446 0
## [numeric] min < med < max: : (100.0%) (0.0%)
## 0 < 1 < 1 :
## IQR (CV) : 0.1 (0.4) :
## : . . :
## ----------------------------------------------------------------------------------------------------------------------------
Looking at the output of summarytools::dfsummary()
OblAmt ranges from -$855,200.3 to $4,408,424,297ReqAmt ranges from -$855,200.3 to $4,388,540,486Let’s dive a little deeper into each variable.
The first of two primary outcomes of interest is the total amount of funds the state requested from the federal government for a disaster declaration.
#summary statistics
base::summary(EDAdata$ReqAmt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -855200 0 109472 29673687 1467763 4388540486
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = ReqAmt)) +
geom_density()
#try a log transformation out of curiosity
#this does ignore all negative values
ggplot2::ggplot(data = EDAdata, aes(x = log(ReqAmt))) +
geom_density()
## Warning in log(ReqAmt): NaNs produced
## Warning in log(ReqAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_density).
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = ReqAmt)) +
geom_boxplot()
#crop to $1M to get a better idea
ggplot2::ggplot(data = EDAdata, aes(y = ReqAmt)) +
geom_boxplot() +
ylim(0, 1000000)
## Warning: Removed 138 rows containing non-finite values (stat_boxplot).
#log transformed boxplot
ggplot2::ggplot(data = EDAdata, aes(y = log(ReqAmt))) +
geom_boxplot()
## Warning in log(ReqAmt): NaNs produced
## Warning in log(ReqAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_boxplot).
##QQ plot
car::qqPlot(EDAdata$ReqAmt)
## [1] 328 273
These outputs show the requested amounts do not follow a normal distribution, which is to be expected given the broad range of disasters. We may need to remove outliers in the analysis, but for now, we’ll leave them to capture the true nature of disaster declarations. The largest outlier is likely the COVID-19 Pandemic. The log transformation does a better job of showing the distribution of the data, so we may need to consider an exponential model in the analysis; however, log(negative number) is undefined, so we would be ignoring all reimbursement data. The transformed and untransformed figures suggest there are significant tails in the data.
The alternative outcome of interest is the funding obligated from FEMA. While this may be the same as ReqAmt (especially for large-scale disasters), FEMA doesn’t always pay the amount the states request.
#summary statistics
base::summary(EDAdata$OblAmt)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -855200 0 109472 29744109 1470450 4408424297
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = OblAmt)) +
geom_density()
#try a log transformation out of curiosity
ggplot2::ggplot(data = EDAdata, aes(x = log(OblAmt))) +
geom_density()
## Warning in log(OblAmt): NaNs produced
## Warning in log(OblAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_density).
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = OblAmt)) +
geom_boxplot()
#crop to $1M to get a better idea
ggplot2::ggplot(data = EDAdata, aes(y = OblAmt)) +
geom_boxplot() +
ylim(0, 1000000)
## Warning: Removed 139 rows containing non-finite values (stat_boxplot).
#log transformed boxplot
ggplot2::ggplot(data = EDAdata, aes(y = log(OblAmt))) +
geom_boxplot()
## Warning in log(OblAmt): NaNs produced
## Warning in log(OblAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_boxplot).
##QQ plot
car::qqPlot(EDAdata$OblAmt)
## [1] 328 273
It appears that obligated funds and requested funds have similar distributions and characteristics. The same commentary about the requested funds analysis applies here as well.
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$state, report.nas = FALSE)
## Frequencies
## EDAdata$state
## Type: Character
##
## Freq % % Cum.
## ----------- ------ -------- --------
## AK 8 1.79 1.79
## AL 10 2.24 4.04
## AR 10 2.24 6.28
## AZ 3 0.67 6.95
## CA 20 4.48 11.43
## CO 6 1.35 12.78
## CT 9 2.02 14.80
## DC 3 0.67 15.47
## DE 3 0.67 16.14
## FL 18 4.04 20.18
## GA 13 2.91 23.09
## HI 7 1.57 24.66
## IA 10 2.24 26.91
## ID 6 1.35 28.25
## IL 5 1.12 29.37
## IN 3 0.67 30.04
## KS 7 1.57 31.61
## KY 7 1.57 33.18
## LA 22 4.93 38.12
## MA 6 1.35 39.46
## ME 3 0.67 40.13
## MI 7 1.57 41.70
## MN 10 2.24 43.95
## MO 8 1.79 45.74
## MS 15 3.36 49.10
## MT 2 0.45 49.55
## NC 14 3.14 52.69
## ND 6 1.35 54.04
## NE 9 2.02 56.05
## NH 7 1.57 57.62
## NJ 5 1.12 58.74
## NM 6 1.35 60.09
## NV 3 0.67 60.76
## NY 11 2.47 63.23
## OH 7 1.57 64.80
## OK 12 2.69 67.49
## OR 10 2.24 69.73
## PA 8 1.79 71.52
## PR 11 2.47 73.99
## RI 6 1.35 75.34
## SC 13 2.91 78.25
## SD 11 2.47 80.72
## TN 11 2.47 83.18
## TX 21 4.71 87.89
## UT 1 0.22 88.12
## VA 7 1.57 89.69
## VT 6 1.35 91.03
## WA 12 2.69 93.72
## WI 9 2.02 95.74
## WV 16 3.59 99.33
## WY 3 0.67 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#filter top 5 states
#start with obligated amount
obl_fig_state <- EDAdata %>%
dplyr::add_count(state) %>%
dplyr::filter(dplyr::dense_rank(-n) < 6) %>%
ggplot2::ggplot(aes(x = state, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(x = "State or Territory",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated to five most disaster-prone states")
obl_fig_state
#save file
obl_state_fig_file = here("results","obl-state.png")
ggsave(filename = obl_state_fig_file, plot = obl_fig_state)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_state <- EDAdata %>%
dplyr::add_count(state) %>%
dplyr::filter(dplyr::dense_rank(-n) < 6) %>%
ggplot2::ggplot(aes(x = state, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(x = "State or Territory",
y = "Requested FEMA Funds",
title = "FEMA funds requested by five most disaster-prone states")
req_fig_state
#save file
req_state_fig_file = here("results","req-state.png")
ggsave(filename = req_state_fig_file, plot = req_fig_state)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 5 most disaster-prone states
decs_state_5 <- EDAdata %>%
dplyr::add_count(state) %>%
dplyr::filter(dense_rank(-n) < 6) %>%
gtsummary::select(state, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = state,
label = list(state ~ "State or Territory",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3", "stat_4", "stat_5") ~ "**State**") %>%
gtsummary::modify_caption("**Table 4. FEMA Funding for 5 Most Disaster-Prone States 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_state_5
| Variable | State | ||||
|---|---|---|---|---|---|
| CA, N = 201 | FL, N = 181 | LA, N = 221 | TX, N = 211 | WV, N = 161 | |
| Requested Funds | 62,605,465 (169,900,730) | 27,337,218 (68,075,813) | 34,318,344 (90,594,708) | 36,586,287 (119,143,196) | 6,659,588 (25,471,101) |
| Obligated Funds | 62,605,465 (169,900,730) | 27,337,218 (68,075,813) | 34,318,344 (90,594,708) | 36,637,886 (119,167,108) | 6,660,260 (25,470,955) |
|
1
Mean (SD)
|
|||||
#save file
decs_state_file = here("results", "decs-state-5.Rds")
saveRDS(decs_state_5, file = decs_state_file)
#summary statistics since it's continuous
base::summary(EDAdata$IncidentYear)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2012 2015 2018 2017 2020 2021
#examine graphical relationship with outcomes
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentYear)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentYear)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$IncidentYear)
## [1] 8 11
#histogram distribution
year_hist <- EDAdata %>%
ggplot2::ggplot(aes(x=IncidentYear)) +
geom_bar(fill="steelblue4") +
scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 125)) +
geom_text(stat='count', aes(label=..count..), vjust=-1) +
labs(x = "Incident Year",
y = "Number of Declarations",
title = "Disaster Declarations per Year (2012 - 2021)")
year_hist
#save file
year_fig_file = here("results","incidentyear.png")
ggsave(filename = year_fig_file, plot = year_hist)
## Saving 7 x 5 in image
#examine graphical relationship with outcomes of interest
#look at total cost per incident year
req_fig_year <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentYear)) +
geom_bar(aes(y = ReqAmt),
stat = "identity",
fill = "tomato") +
scale_y_continuous(expand = c(0, 0),
labels = scales::dollar_format()) +
scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
labs(x = "Incident Year",
y = "Funds ($)",
subtitle = "Requested")
req_fig_year
#repeat for obligated funds
obl_fig_year <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentYear)) +
geom_bar(aes(y = OblAmt),
stat = "identity",
fill = "turquoise4") +
scale_y_continuous(expand = c(0, 0),
labels = scales::dollar_format()) +
scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
labs(x = "Incident Year",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_year
#combine into one plot
plot_col <- cowplot::plot_grid(req_fig_year,
obl_fig_year,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
plot_col
title1 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds per year",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_year <- cowplot::plot_grid(
title1, plot_col,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_year
#save file
funds_year_fig_file = here("results","funds-years.png")
ggsave(filename = funds_year_fig_file, plot = funds_fig_year)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 5 most disaster-prone states
decs_year_3 <- EDAdata %>%
dplyr::add_count(IncidentYear) %>%
dplyr::filter(dense_rank(-n) < 4) %>%
gtsummary::select(IncidentYear, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = IncidentYear,
label = list(IncidentYear ~ "Year",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Year**") %>%
gtsummary::modify_caption("**Table 2. FEMA Funding for the 3 Costliest Years (2012 - 2021)**") %>%
gtsummary::bold_labels()
decs_year_3
| Variable | Year | ||
|---|---|---|---|
| 2012, N = 431 | 2017, N = 481 | 2020, N = 1151 | |
| Requested Funds | 8,523,455 (49,075,292) | 116,482,973 (638,527,353) | 54,834,301 (104,214,883) |
| Obligated Funds | 8,740,375 (49,166,434) | 116,915,384 (641,357,763) | 54,838,235 (104,217,430) |
|
1
Mean (SD)
|
|||
#save file
decs_year_file = here("results", "decs-year-3.Rds")
saveRDS(decs_year_3, file = decs_year_file)
#summary statistics since it's continuous
base::summary(EDAdata$IncidentMonth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 6.000 5.617 9.000 12.000
#examine graphical relationship with outcomes
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentMonth)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentMonth)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$IncidentMonth)
## [1] 16 23
#histogram distribution
month_hist <- EDAdata %>%
ggplot2::ggplot(aes(x=IncidentMonth)) +
geom_bar(fill="darkseagreen4") +
scale_x_continuous(breaks = seq(1, 12, by = 1),
labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 110)) +
geom_text(stat='count', aes(label=..count..), vjust=-1) +
labs(x = "Declaration Month",
y = "Number of Declarations",
title = "Disaster Declarations per Month (2012 - 2021)")
month_hist
#save file
month_fig_file = here("results","incidentmonth.png")
ggsave(filename = month_fig_file, plot = month_hist)
## Saving 7 x 5 in image
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_month <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentMonth)) +
geom_bar(aes(y = ReqAmt),
stat = "identity",
fill = "tomato") +
scale_y_continuous(expand = c(0, 0),
labels = scales::dollar_format()) +
scale_x_continuous(breaks = seq(1, 12, by = 1),
labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
labs(x = "Declaration Month",
y = "Funds ($)",
subtitle = "Requested")
req_fig_month
#repeat for obligated funds
obl_fig_month <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentMonth)) +
geom_bar(aes(y = OblAmt),
stat = "identity",
fill = "turquoise4") +
scale_y_continuous(expand = c(0, 0),
labels = scales::dollar_format()) +
scale_x_continuous(breaks = seq(1, 12, by = 1),
labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
labs(x = "Declaration Month",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_month
#combine into one plot
plot_col1 <- cowplot::plot_grid(req_fig_month,
obl_fig_month,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
plot_col1
title2 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by declaration month",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_month <- cowplot::plot_grid(
title2, plot_col1,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_month
#save file
funds_month_fig_file = here("results","funds-month.png")
ggsave(filename = funds_month_fig_file, plot = funds_fig_month)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 3 most distaster prone-months
decs_month_3 <- EDAdata %>%
dplyr::add_count(IncidentMonth) %>%
dplyr::filter(dense_rank(-n) < 4) %>%
gtsummary::select(IncidentMonth, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = IncidentMonth,
label = list(IncidentMonth ~ "Month",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_header(update = list(
stat_1 ~ "**Jan**",
stat_2 ~ "**Sept**",
stat_3 ~ "**Oct**"
)) %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Month**") %>%
gtsummary::modify_caption("**Table 3. FEMA Funding for the 3 Costliest Months (2012 - 2021)**") %>%
gtsummary::bold_labels()
decs_month_3
| Variable | Month | ||
|---|---|---|---|
| Jan1 | Sept1 | Oct1 | |
| Requested Funds | 63,229,783 (111,136,115) | 100,831,890 (639,779,995) | 22,992,573 (101,460,411) |
| Obligated Funds | 63,239,449 (111,135,744) | 101,252,822 (642,677,256) | 23,165,582 (101,470,263) |
|
1
Mean (SD)
|
|||
#save file
decs_month_file = here("results", "decs-month-3.Rds")
saveRDS(decs_month_3, file = decs_month_file)
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$declarationType, report.nas = FALSE)
## Frequencies
## EDAdata$declarationType
## Type: Character
##
## Freq % % Cum.
## ----------- ------ -------- --------
## DR 330 73.99 73.99
## EM 116 26.01 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_DT <- EDAdata %>%
ggplot2::ggplot(aes(x = declarationType, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Major Disaster Declaration", "Emergency Declaration")) +
labs(x = "Declaration Type",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated for declaration types")
obl_fig_DT
#save file
obl_DT_fig_file = here("results","obl-DT.png")
ggsave(filename = obl_DT_fig_file, plot = obl_fig_DT)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_DT <- EDAdata %>%
ggplot2::ggplot(aes(x = declarationType, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Major Disaster Declaration", "Emergency Declaration")) +
labs(x = "Declaration Type",
y = "Requested FEMA Funds",
title = "FEMA funds requested for declaration types")
req_fig_DT
#save file
req_DT_fig_file = here("results","req-DT.png")
ggsave(filename = req_DT_fig_file, plot = req_fig_DT)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_dtype <- EDAdata %>%
gtsummary::select(declarationType, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = declarationType,
label = list(declarationType ~ "Type",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_header(update = list(
stat_1 ~ "**Major Disaster**",
stat_2 ~ "**Emergency**")) %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Type**") %>%
gtsummary::modify_caption("**Table 4. FEMA Funding for Declaration Types 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_dtype
| Variable | Type | |
|---|---|---|
| Major Disaster1 | Emergency1 | |
| Requested Funds | 40,010,548 (253,637,660) | 267,101 (625,898) |
| Obligated Funds | 40,105,724 (254,673,610) | 267,101 (625,898) |
|
1
Mean (SD)
|
||
#save file
decs_dtype_file = here("results", "decs-DT.Rds")
saveRDS(decs_dtype, file = decs_dtype_file)
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$incidentType, report.nas = FALSE)
## Frequencies
## EDAdata$incidentType
## Type: Character
##
## Freq % % Cum.
## ---------------------- ------ -------- --------
## Biological 78 17.49 17.49
## Chemical 1 0.22 17.71
## Coastal Storm 4 0.90 18.61
## Dam/Levee Break 2 0.45 19.06
## Earthquake 5 1.12 20.18
## Fire 22 4.93 25.11
## Flood 75 16.82 41.93
## Hurricane 105 23.54 65.47
## Mud/Landslide 3 0.67 66.14
## Other 4 0.90 67.04
## Severe Ice Storm 12 2.69 69.73
## Severe Storm(s) 117 26.23 95.96
## Snow 5 1.12 97.09
## Terrorist 1 0.22 97.31
## Tornado 11 2.47 99.78
## Volcano 1 0.22 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_IT <- EDAdata %>%
ggplot2::ggplot(aes(x = incidentType, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(x = "Incident Type",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated for incident types") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_IT
#save file
obl_IT_fig_file = here("results","obl-IT.png")
ggsave(filename = obl_IT_fig_file, plot = obl_fig_IT)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_IT <- EDAdata %>%
ggplot2::ggplot(aes(x = incidentType, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(x = "Incident Type",
y = "Requested FEMA Funds",
title = "FEMA funds requested for declaration types") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_IT
#save file
req_IT_fig_file = here("results","req-IT.png")
ggsave(filename = req_IT_fig_file, plot = req_fig_IT)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_itype3 <- EDAdata %>%
dplyr::add_count(incidentType) %>%
dplyr::filter(dense_rank(-n) < 4) %>%
gtsummary::select(incidentType, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = incidentType,
label = list(incidentType ~ "Type",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Type**") %>%
gtsummary::modify_caption("**Table 4. FEMA Funding for Declaration Types 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_itype3
| Variable | Type | ||
|---|---|---|---|
| Biological, N = 781 | Hurricane, N = 1051 | Severe Storm(s), N = 1171 | |
| Requested Funds | 78,601,344 (119,067,029) | 57,089,787 (430,915,934) | 289,380 (902,722) |
| Obligated Funds | 78,607,144 (119,069,157) | 57,370,720 (432,833,622) | 289,950 (903,592) |
|
1
Mean (SD)
|
|||
#save file
decs_itype3_file = here("results", "decs-IT-3.Rds")
saveRDS(decs_itype3, file = decs_itype3_file)
#convert class to numeric
EDAdata$IncidentDuration <- as.numeric(EDAdata$IncidentDuration, units="days")
#summary statistics since it's continuous
base::summary(EDAdata$IncidentDuration)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 4.00 12.00 128.25 38.72 660.00
#NAs are ongoing events
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentDuration)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentDuration)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$IncidentDuration)
## [1] 3 5
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_dur <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentDuration)) +
geom_point(aes(y = ReqAmt),
color = "tomato",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Incident Duration (Days)",
y = "Funds ($)",
subtitle = "Requested")
req_fig_dur
## Warning: Removed 6 rows containing missing values (geom_point).
#repeat for obligated funds
obl_fig_dur <- EDAdata %>%
ggplot2::ggplot(aes(x = IncidentDuration)) +
geom_point(aes(y = OblAmt),
color = "turquoise4",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Incident Duration (Days)",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_dur
## Warning: Removed 6 rows containing missing values (geom_point).
#combine into one plot
plot_col2 <- cowplot::plot_grid(req_fig_dur,
obl_fig_dur,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
plot_col2
title3 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by Incident Duration (Under $1B)",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_dur <- cowplot::plot_grid(
title3, plot_col2,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_dur
#save file
funds_dur_fig_file = here("results","funds-dur.png")
ggsave(filename = funds_dur_fig_file, plot = funds_fig_dur)
## Saving 7 x 5 in image
#convert class to numeric
EDAdata$Population <- as.numeric(gsub(",", "", EDAdata$Population))
#summary statistics since it's continuous
base::summary(EDAdata$Population)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 576851 2961279 5024279 8644007 10439388 39538223
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = Population)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = Population)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$Population)
## [1] 32 33
#examine graphical relationship with outcomes of interest
#look at total cost per pop
req_fig_pop <- EDAdata %>%
ggplot2::ggplot(aes(x = Population)) +
geom_point(aes(y = ReqAmt),
color = "tomato",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
scale_x_continuous(label = comma,
limits = c(0, 40000000)) +
labs(x = "State Population",
y = "Funds ($)",
subtitle = "Requested")
req_fig_pop
## Warning: Removed 6 rows containing missing values (geom_point).
#repeat for obligated funds
obl_fig_pop <- EDAdata %>%
ggplot2::ggplot(aes(x = Population)) +
geom_point(aes(y = OblAmt),
color = "turquoise4",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
scale_x_continuous(label = comma,
limits = c(0, 40000000)) +
labs(x = "State Population",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_pop
## Warning: Removed 6 rows containing missing values (geom_point).
#combine into one plot
plot_col3 <- cowplot::plot_grid(req_fig_pop,
obl_fig_pop,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
plot_col3
title4 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by State Population (Under $1B)",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_pop <- cowplot::plot_grid(
title4, plot_col3,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_pop
#save file
funds_pop_fig_file = here("results","funds-pop.png")
ggsave(filename = funds_pop_fig_file, plot = funds_fig_pop)
## Saving 7 x 5 in image
#summary statistics since it's continuous
base::summary(EDAdata$TotalAgencies)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 2.000 4.908 7.000 55.000
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = TotalAgencies)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = TotalAgencies)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$TotalAgencies)
## [1] 328 384
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_ag <- EDAdata %>%
ggplot2::ggplot(aes(x = TotalAgencies)) +
geom_point(aes(y = ReqAmt),
color = "tomato",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Number of Federal Agencies",
y = "Funds ($)",
subtitle = "Requested")
req_fig_ag
## Warning: Removed 6 rows containing missing values (geom_point).
#repeat for obligated funds
obl_fig_ag <- EDAdata %>%
ggplot2::ggplot(aes(x = TotalAgencies)) +
geom_point(aes(y = OblAmt),
color = "turquoise4",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Number of Federal Agencies",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_ag
## Warning: Removed 6 rows containing missing values (geom_point).
#combine into one plot
plot_col4 <- cowplot::plot_grid(req_fig_ag,
obl_fig_ag,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
plot_col4
title5 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by number of federal agencies (Under $1B)",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_ag <- cowplot::plot_grid(
title5, plot_col4,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_ag
#save file
funds_ag_fig_file = here("results","funds-ag.png")
ggsave(filename = funds_ag_fig_file, plot = funds_fig_ag)
## Saving 7 x 5 in image
#summary statistics since it's continuous
base::summary(EDAdata$Counties)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 46.00 67.00 74.78 92.75 254.00
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = Counties)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = Counties)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$Counties)
## [1] 372 373
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_count <- EDAdata %>%
ggplot2::ggplot(aes(x = Counties)) +
geom_point(aes(y = ReqAmt),
color = "tomato",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Number of Counties",
y = "Funds ($)",
subtitle = "Requested")
req_fig_count
## Warning: Removed 6 rows containing missing values (geom_point).
#repeat for obligated funds
obl_fig_count <- EDAdata %>%
ggplot2::ggplot(aes(x = Counties)) +
geom_point(aes(y = OblAmt),
color = "turquoise4",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Number of Counties",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_count
## Warning: Removed 6 rows containing missing values (geom_point).
#combine into one plot
plot_col5 <- cowplot::plot_grid(req_fig_count,
obl_fig_count,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
plot_col5
title6 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by number of counties (Under $1B)",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_count <- cowplot::plot_grid(
title6, plot_col5,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_count
#save file
funds_count_fig_file = here("results","funds-count.png")
ggsave(filename = funds_count_fig_file, plot = funds_fig_count)
## Saving 7 x 5 in image
#convert to factor
EDAdata$IH <- as.factor(EDAdata$IH)
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$IH, report.nas = FALSE)
## Frequencies
## EDAdata$IH
## Type: Factor
##
## Freq % % Cum.
## ----------- ------ -------- --------
## 0 267 59.87 59.87
## 1 179 40.13 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_IH <- EDAdata %>%
ggplot2::ggplot(aes(x = IH, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "IH Program",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated when IH program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_IH
#save file
obl_IH_fig_file = here("results","obl-IH.png")
ggsave(filename = obl_IH_fig_file, plot = obl_fig_IH)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_IH <- EDAdata %>%
ggplot2::ggplot(aes(x = IH, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "IH Program",
y = "Requested FEMA Funds",
title = "FEMA funds requested when IH program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_IH
#save file
req_IH_fig_file = here("results","req-IH.png")
ggsave(filename = req_IH_fig_file, plot = req_fig_IH)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_ih <- EDAdata %>%
gtsummary::select(IH, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = IH,
label = list(IH ~ "Individuals / Households Program",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_header(update = list(
stat_1 ~ "**Not Awarded**",
stat_2 ~ "**Awarded**")) %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Individuals/Households Program**") %>%
gtsummary::modify_caption("**Table 5. FEMA Funding by IH Program Awardance 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_ih
| Variable | Individuals/Households Program | |
|---|---|---|
| Not Awarded1 | Awarded1 | |
| Requested Funds | 293,100 (784,973) | 73,498,363 (341,233,996) |
| Obligated Funds | 293,100 (784,973) | 73,673,828 (342,639,903) |
|
1
Mean (SD)
|
||
#save file
decs_ih_file = here("results", "decs-IH.Rds")
saveRDS(decs_ih, file = decs_ih_file)
#convert to factor
EDAdata$PA <- as.factor(EDAdata$PA)
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$PA, report.nas = FALSE)
## Frequencies
## EDAdata$PA
## Type: Factor
##
## Freq % % Cum.
## ----------- ------ -------- --------
## 0 12 2.69 2.69
## 1 434 97.31 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_PA <- EDAdata %>%
ggplot2::ggplot(aes(x = PA, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "PA Program",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated when PA program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_PA
#save file
obl_PA_fig_file = here("results","obl-PA.png")
ggsave(filename = obl_PA_fig_file, plot = obl_fig_PA)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_PA <- EDAdata %>%
ggplot2::ggplot(aes(x = PA, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "PA Program",
y = "Requested FEMA Funds",
title = "FEMA funds requested when PA program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_PA
#save file
req_PA_fig_file = here("results","req-PA.png")
ggsave(filename = req_PA_fig_file, plot = req_fig_PA)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_pa <- EDAdata %>%
gtsummary::select(PA, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = PA,
label = list(PA ~ "Public Assistance Program",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_header(update = list(
stat_1 ~ "**Not Awarded**",
stat_2 ~ "**Awarded**")) %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Public Assistance Program**") %>%
gtsummary::modify_caption("**Table 6. FEMA Funding by PA Program Awardance 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_pa
| Variable | Public Assistance Program | |
|---|---|---|
| Not Awarded1 | Awarded1 | |
| Requested Funds | 162,348 (355,421) | 30,489,669 (221,740,562) |
| Obligated Funds | 162,348 (355,421) | 30,562,038 (222,644,037) |
|
1
Mean (SD)
|
||
#save file
decs_pa_file = here("results", "decs-PA.Rds")
saveRDS(decs_pa, file = decs_pa_file)
#convert to factor
EDAdata$HM <- as.factor(EDAdata$HM)
#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$HM, report.nas = FALSE)
## Frequencies
## EDAdata$HM
## Type: Factor
##
## Freq % % Cum.
## ----------- ------ -------- --------
## 0 117 26.23 26.23
## 1 329 73.77 100.00
## Total 446 100.00 100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_HM <- EDAdata %>%
ggplot2::ggplot(aes(x = HM, y = OblAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "HM Program",
y = "Obligated FEMA Funds",
title = "FEMA funds obligated when HM program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_HM
#save file
obl_HM_fig_file = here("results","obl-HM.png")
ggsave(filename = obl_HM_fig_file, plot = obl_fig_HM)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_HM <- EDAdata %>%
ggplot2::ggplot(aes(x = HM, y = ReqAmt)) +
geom_boxplot() +
geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
labs(x = "HM Program",
y = "Requested FEMA Funds",
title = "FEMA funds requested when HM program is awarded") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_HM
#save file
req_HM_fig_file = here("results","req-HM.png")
ggsave(filename = req_HM_fig_file, plot = req_fig_HM)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_hm <- EDAdata %>%
gtsummary::select(HM, ReqAmt, OblAmt) %>%
gtsummary::tbl_summary(
by = HM,
label = list(HM ~ "Hazard Mitigation Program",
ReqAmt ~ "Requested Funds",
OblAmt ~ "Obligated Funds"),
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
gtsummary::modify_header(label ~ "**Variable**") %>%
gtsummary::modify_header(update = list(
stat_1 ~ "**Not Awarded**",
stat_2 ~ "**Awarded**")) %>%
gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Hazard Mitigation Program**") %>%
gtsummary::modify_caption("**Table 7. FEMA Funding by HM Program Awardance 2012 - 2021**") %>%
gtsummary::bold_labels()
decs_hm
| Variable | Hazard Mitigation Program | |
|---|---|---|
| Not Awarded1 | Awarded1 | |
| Requested Funds | 264,949 (623,629) | 40,132,114 (254,014,380) |
| Obligated Funds | 264,949 (623,629) | 40,227,580 (255,051,902) |
|
1
Mean (SD)
|
||
#save file
decs_hm_file = here("results", "decs-HM.Rds")
saveRDS(decs_hm, file = decs_hm_file)
#summary statistics since it's continuous
base::summary(EDAdata$ResponseDays)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 70.04 660.00 1098.00 1332.20 2018.23 3588.00
#NAs are ongoing events
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = ResponseDays)) +
geom_density()
#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = ResponseDays)) +
geom_boxplot()
##QQ plot
car::qqPlot(EDAdata$ResponseDays)
## [1] 413 145
#examine graphical relationship with outcomes of interest
req_fig_resp <- EDAdata %>%
ggplot2::ggplot(aes(x = ResponseDays)) +
geom_point(aes(y = ReqAmt),
color = "tomato",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Response Duration (Days)",
y = "Funds ($)",
subtitle = "Requested")
req_fig_resp
## Warning: Removed 6 rows containing missing values (geom_point).
#repeat for obligated funds
obl_fig_resp <- EDAdata %>%
ggplot2::ggplot(aes(x = ResponseDays)) +
geom_point(aes(y = OblAmt),
color = "turquoise4",
show.legend = FALSE) +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1000000000),
labels = scales::dollar_format()) +
labs(x = "Response Duration (Days)",
y = "Funds ($)",
subtitle = "Obligated")
obl_fig_resp
## Warning: Removed 6 rows containing missing values (geom_point).
#combine into one plot
plot_col7 <- cowplot::plot_grid(req_fig_resp,
obl_fig_resp,
ncol = 1,
align = "v",
rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
plot_col7
title8 <- cowplot::ggdraw() +
draw_label(
"Requested vs obligated FEMA funds by response duration (Under $1B)",
fontface = 'bold',
x = 0,
hjust = 0) +
theme( plot.margin = margin(0, 0, 0, 7))
funds_fig_resp <- cowplot::plot_grid(
title8, plot_col7,
ncol = 1,
# rel_heights values control vertical title margins
rel_heights = c(0.1, 1))
funds_fig_resp
#save file
funds_resp_fig_file = here("results","funds-resp.png")
ggsave(filename = funds_resp_fig_file, plot = funds_fig_resp)
## Saving 7 x 5 in image
Make the summary table for the manuscript. Variables to include:
#label factors
EDAdata$IH <- factor(EDAdata$IH,
levels = c(0, 1),
labels = c("Not Awarded", "Awarded"))
EDAdata$PA <- factor(EDAdata$PA,
levels = c(0, 1),
labels = c("Not Awarded", "Awarded"))
EDAdata$HM <- factor(EDAdata$HM,
levels = c(0, 1),
labels = c("Not Awarded", "Awarded"))
#specify table labels
table1::label(EDAdata$state) <- "State"
table1::label(EDAdata$IncidentYear) <- "Incident Year"
table1::label(EDAdata$IncidentMonth) <- "Declaration Month"
table1::label(EDAdata$declarationType) <- "Declaration Type"
table1::label(EDAdata$incidentType) <- "Incident Type"
table1::label(EDAdata$IncidentDuration) <- "Incident Duration"
table1::label(EDAdata$ResponseDays) <- "Response Duration"
table1::label(EDAdata$Population) <- "State Population"
table1::label(EDAdata$Counties) <- "Counties per State"
table1::label(EDAdata$TotalAgencies) <- "Federal Agencies Involved"
table1::label(EDAdata$FEMACSAvg) <- "Average FEMA Cost Share"
table1::label(EDAdata$IH) <- "Individuals/Households Program"
table1::label(EDAdata$PA) <- "Public Assistance Program"
table1::label(EDAdata$HM) <- "Hazard Mitigation Program"
table1::label(EDAdata$ReqAmt) <- "Requested FEMA Funds"
table1::label(EDAdata$OblAmt) <- "Obligated FEMA Funds"
#specify units
table1::units(EDAdata$IncidentDuration) <- "days"
table1::units(EDAdata$ResponseDays) <- "days"
#create table
table1 <- table1::table1(~ ReqAmt + OblAmt + state + IncidentYear + IncidentMonth + declarationType + incidentType +
IncidentDuration + ResponseDays + IH + PA + HM + Population + Counties + TotalAgencies + FEMACSAvg,
data = EDAdata)
#save file
table1_file = here("results", "table1.Rds")
saveRDS(table1, file = table1_file)
Since we’ve made some changes to the format of the processed data, let’s save it for the analysis.
#location to save processed file
analysis_data_location <- here::here("data","processed_data","analysisdata.rds")
#save as an RDS
saveRDS(EDAdata, file = analysis_data_location)